Picking single-nucleotide polymorphisms in forests
Authors
Abstract
With the development of high-throughput single-nucleotide polymorphism (SNP) technologies, the vast number of SNPs in comparatively small samples poses a challenge to the application of classical statistical procedures. A possible solution for case-control data is a two-stage approach in which a screening test in the first stage selects a small number of SNPs for further analysis, and the second stage estimates the effects of the selected variables using logistic regression (logReg). Here, we introduce a novel approach in which the selection of SNPs is based on the permutation importance estimated by random forests (RFs).
For this, we used the simulated data provided for the Genetic Analysis Workshop 15 without knowledge of the true model. The data set was randomly split into a first and a second data set. In the first stage, RFs were grown to pre-select the 37 most important variables, and these were reduced to 32 variables by haplotype tagging. In the second stage, we estimated parameters using logReg.
The highest effect estimates were obtained for five simulated loci. We detected smoking, gender, and the parental DR alleles as covariates. After correction for multiple testing, we identified two of the four genes simulated with a direct effect on rheumatoid arthritis risk, as well as all covariates, without any false positives.
We showed that a two-stage approach with screening of SNPs by RFs is suitable for detecting candidate SNPs in genome-wide association studies of complex diseases.
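The two-stage idea can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the toy data (12 SNPs coded 0/1/2, with only SNP 0 affecting case status) stands in for the Genetic Analysis Workshop 15 data, a forest of depth-one trees (stumps) stands in for full random forests, three SNPs are kept instead of 37, haplotype tagging is skipped, and the second stage uses a per-SNP Wald test with a Bonferroni threshold, since the abstract does not name the correction method. All function names are hypothetical.

```python
import math
import random

def fit_stump(X, y, rows, features):
    """Best depth-one split (feature, threshold, label above threshold) on the given rows."""
    best = None
    for j in features:
        for t in (0, 1):            # genotypes are coded 0/1/2
            for hi in (0, 1):       # class predicted when genotype > t
                correct = sum((hi if X[i][j] > t else 1 - hi) == y[i] for i in rows)
                if best is None or correct > best[0]:
                    best = (correct, j, t, hi)
    return best[1:]

def predict(stump, genotype):
    _, t, hi = stump
    return hi if genotype > t else 1 - hi

def permutation_importance(X, y, n_trees=200, mtry=3, seed=0):
    """Mean drop in out-of-bag accuracy after permuting each SNP (stump forest)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]     # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag rows
        if not oob:
            continue
        stump = fit_stump(X, y, boot, rng.sample(range(p), mtry))
        j = stump[0]  # only the SNP used by this tree can change its predictions
        base = sum(predict(stump, X[i][j]) == y[i] for i in oob)
        shuffled = [X[i][j] for i in oob]
        rng.shuffle(shuffled)
        perm = sum(predict(stump, g) == y[i] for i, g in zip(oob, shuffled))
        importance[j] += (base - perm) / len(oob) / n_trees
    return importance

def logreg_1d(x, y, iters=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b0, b1, se of b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, math.sqrt(h00 / det)

# Toy case-control data: only SNP 0 influences case status (an assumption).
rng = random.Random(1)
X = [[rng.choice((0, 1, 1, 2)) for _ in range(12)] for _ in range(600)]
y = [int(rng.random() < (0.8 if row[0] > 0 else 0.2)) for row in X]

# Random split into a screening set (stage 1) and an estimation set (stage 2).
idx = list(range(len(X)))
rng.shuffle(idx)
first, second = idx[:300], idx[300:]

# Stage 1: rank SNPs by permutation importance and keep the top three.
imp = permutation_importance([X[i] for i in first], [y[i] for i in first])
selected = sorted(range(len(imp)), key=lambda j: -imp[j])[:3]

# Stage 2: per-SNP logistic regression with a Bonferroni-corrected Wald test.
alpha = 0.05 / len(selected)
significant = []
for j in selected:
    _, b1, se = logreg_1d([X[i][j] for i in second], [y[i] for i in second])
    p_value = math.erfc(abs(b1 / se) / math.sqrt(2))  # two-sided normal p-value
    if p_value < alpha:
        significant.append(j)

print("selected SNPs:", selected)
print("significant after correction:", significant)
```

Because the second stage only tests the few SNPs that survive screening, and does so on data not used for screening, the multiple-testing burden is far smaller than testing every SNP directly.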
Similar resources
Single Nucleotide Polymorphisms and Association Studies: A Few Critical Points
Uncovering DNA sequence variations that correlate with phenotypic changes, e.g., diseases, is the aim of sequence variation studies. A common type of sequence variation is the single nucleotide polymorphism (SNP, pronounced snip). SNPs are the third-generation molecular marker. A SNP represents a DNA sequence variant of a single base pair with the minor allele occurring in more than 1% of a given popula...
Association of two single nucleotide polymorphisms rs10407022 and rs3741664 with the risk of primary ovarian insufficiency in a sample of Iraqi women
Primary ovarian insufficiency (POI) can be a devastating disease affecting women below the age of forty. It involves a major decrease in the number and quality of oocytes, or ovarian reserve, in a woman. The main objective of this study is to examine the distribution of the single-nucleotide polymorphisms rs10407022 and rs3741664 in Iraqi people and their association with primary ovarian insufficiency. The m...
In-silico study to identify the pathogenic single nucleotide polymorphisms in the coding region of CDKN2A gene
Background: CDKN2A, encoding two important tumor suppressor proteins p16 and p14, is a tumor suppressor gene. Mutations in this gene and subsequently the defect in p16 and p14 proteins lead to the downregulation of RB1/p53 and cancer malignancy. To identify the structural and functional effects of mutations, various powerful bioinformatics tools are available. The aim of this study is the ident...
The Single Nucleotide Polymorphisms in the C-reactive Protein Gene: are they Biomarkers of Cardiovascular Risk?
Recent pre-clinical and clinical studies have revealed that the C-reactive protein gene (CRP) is related to the degree of acute rise in plasma C-reactive protein (CRP) levels. Moreover, single nucleotide polymorphisms (SNPs) in the CRP gene may be associated with increased risk of cancer, atherosclerosis, diabetes mellitus, bowel disease, rheumatoid arthritis, psoriasis, obstructive pulmonary disease,...
No association between single nucleotide polymorphisms in pre-miRNAs and the risk of gastric cancer in Chinese population
Objective(s): Accumulating evidence has demonstrated that miRNAs contribute to various genetic and epigenetic modifications in the pathogenesis of gastric cancer (GC). Recent studies focused on the four single nucleotide polymorphisms (SNPs) of pre-miRNAs including rs11614913, rs3746444, rs2910164, and rs2292832. It was suggested that these four SNPs were significantly associated with the risk ...
Journal title:
- BMC Proceedings
Volume 1, Issue
Pages -
Publication date 2007